Information Distance in Multiples

Author

  • Paul M. B. Vitányi
Abstract

Information distance is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering, and classification. The notion of information distance is extended from pairs to multiples (finite lists). We study maximal overlap, metricity, universality, minimal overlap, additivity, and normalized information distance in multiples. We use the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.
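The approximation mentioned in the abstract can be illustrated with a minimal sketch. It assumes zlib as the "real-world compression program" and uses the standard pairwise normalized compression distance. The function names (`compressed_len`, `ncd`, `list_distance`) are hypothetical, and the list version rests on the additional assumption that a conditional complexity K(X|x) is estimated by C(X) - C(x); it is an illustration, not a formula quoted from the paper.

```python
import random
import zlib


def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed data: a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Pairwise normalized compression distance.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed length.
    """
    cx, cy = compressed_len(x), compressed_len(y)
    return (compressed_len(x + y) - min(cx, cy)) / max(cx, cy)


def list_distance(items: list[bytes]) -> int:
    """Rough compression-based distance for a finite list (assumption).

    Estimates max over x of K(X|x) by C(X) - min over x of C(x),
    where X is approximated by the concatenation of the items.
    """
    whole = compressed_len(b"".join(items))
    return whole - min(compressed_len(item) for item in items)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 50
    rng = random.Random(0)  # fixed seed so the example is reproducible
    noise = bytes(rng.getrandbits(8) for _ in range(2000))

    print("NCD(text, text): ", ncd(text, text))
    print("NCD(text, noise):", ncd(text, noise))
    print("list distance [text, text, text]:", list_distance([text, text, text]))
    print("list distance [text, noise]:     ", list_distance([text, noise]))
```

Similar inputs should give values near 0 and unrelated inputs values near 1 for the pairwise measure. The exact numbers depend on the compressor, so a different choice (for example bz2 or lzma) will shift them.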

Similar resources

I-16: Assisted Reproduction and Multiple Gestation: What Are The Psychological Consequences

To describe the social and historical impact of ART on multiple gestations and to discuss psychological issues unique to these families. At the time of the birth of the Dionne quintuplets in 1934, only 33 cases of quintuplets had been reported in the literature and none of the quintuplets survived more than 50 days. Spontaneous higher order multiple gestation is still rare but the number of iat...

Notes for Math 345

Proof. First, we’ll give an intuitive proof. Let b > 0 be given. Mark off multiples of b on the number line. The integer a will fall between two different multiples. qb is the smaller multiple, and r is the distance from qb to a. Clearly the choices of q and r are unique. Why we need the well-ordering principle: it’s not clear that a will fall between two different multiples of b if the numbers...

Normalized Compression Distance of Multiples

Normalized compression distance (NCD) is a parameter-free similarity measure based on compression. The NCD between pairs of objects is not sufficient for all applications. We propose an NCD of finite multisets (multiples) of objects that is metric and is better for many applications. Previously, attempts to obtain such an NCD failed. We use the theoretical notion of Kolmogorov complexity that f...

Random small Hamming weight products with applications to cryptography

There are many cryptographic constructions in which one uses a random power or multiple of an element in a group or a ring. We describe a fast method to compute random powers and multiples in certain important situations including powers in the Galois field F_{2^n}, multiples on Koblitz elliptic curves, and multiples in NTRU convolution polynomial rings. The underlying idea is to form a random exp...

CorrelatedMultiples: Spatially Coherent Small Multiples with Constrained Multidimensional Scaling

Small multiples are a popular method of summarizing and comparing multiple facets of complex data sets. Since they typically do not take into account correlations between items, serial inspection is needed to search and compare items, which can be ineffective. To address this, we introduce CorrelatedMultiples, an alternative of small multiples in which items are placed so that distances reflect...

Journal title:
  • IEEE Trans. Information Theory

Volume 57, Issue

Pages -

Publication date: 2011